Sign-Perturbed Sums (SPS) with Instrumental Variables 
for the Identification of ARX Systems - Extended Version 

Valerio Volpe†, Balázs Cs. Csáji§, Algo Carè*, Erik Weyer**, Marco C. Campi†


Abstract: We propose a generalization of the recently developed system identification method called Sign-Perturbed Sums (SPS). The proposed construction is based on the instrumental variables estimate. Unlike the original SPS, it can construct non-asymptotic confidence regions for linear regression models in which the regressors contain past values of the output. Hence, it is applicable to ARX systems, as well as to systems with feedback. We show that this approach provides regions with exact confidence under weak assumptions: the true parameter is included in the regions with a (user-chosen) exact probability for any finite sample. The paper also proves the strong consistency of the method and proposes a computationally efficient generalization of the previously proposed ellipsoidal outer approximation. Finally, the new method is demonstrated through numerical experiments using both real-world and simulated data.


I. Introduction 

Estimating parameters of partially unknown systems from observations corrupted by noise is a classic problem in signal processing, system identification, machine learning and statistics [7], [13], [14], [15], [18]. Many standard methods are available that provide point estimates. Given an estimate, a natural task is to evaluate how close the estimated parameter is to the true one. Such an evaluation often takes the form of a confidence region. Confidence regions are especially important for problems where the quality, stability or safety of a process has to be guaranteed.

The Sign-Perturbed Sums (SPS) method was presented in [2], [4], [20], [12]. Implementations of the method based on interval analysis have been proposed in [9], [10], [11], and an application of the method under a different set of assumptions has been presented in [16]. The main feature of the SPS method is that it constructs confidence regions that have an exact probability of containing the system's true parameter, based on a finite number of observed data.

The SPS method of [4] and [20] provides exact confidence 
regions for the true parameter only when the regressors are 
exogenous (i.e., they do not depend on the noise terms), 
which is not the case with ARX systems, or, e.g., when 
feedback is involved. Generalizing the method to the case 
where the regressors can depend on the noise terms is of 
high practical importance. 

In [2] an SPS method which deals with ARX systems 
has been given, and even more general systems have been 
considered in [12], [3]. However, these extensions introduce 
complications in the simple algorithm of [4] and [20], which 
make the method more challenging to analyze and more 
difficult to implement and run. In this paper we follow an 
alternative path, and show that an instrumental variables 
approach allows for notable simplifications in the algorithms. 
This leads, on the one hand, to computationally tractable
methods for building regions and, on the other hand, to easy-to-prove,
and quite general, strong consistency results.

The paper is organized as follows. In the next section we state the problem setting and our main assumptions. The generalization of the SPS algorithm is then presented in Section III, and in Section IV we illustrate the theoretical properties of the constructed confidence regions. Subsequently, in Section V, we give a simplified construction based on an outer ellipsoidal approximation algorithm, similar to the one developed in [4] for the case of exogenous regressors. Finally, in Section VI we show two applications of the generalized SPS algorithm through numerical experiments, using both real-world and computer-generated data. The proofs can be found in the appendices.


The work of B. Cs. Csáji was supported by the Hungarian Scientific Research Fund (OTKA), pr. no. 113038, and by the János Bolyai Research Fellowship of the Hungarian Academy of Sciences, pr. no. BO/00683/12/6. The work of A. Carè and E. Weyer was supported by the Australian Research Council (ARC) under Discovery Grant DP130104028. The work of M. C. Campi was partly supported by MIUR - Ministero dell'Istruzione, dell'Università e della Ricerca.

V. Volpe† and M. C. Campi† are with the Department of Information Engineering, University of Brescia, Via Branze 38, 25123 Brescia, Italy (email: v.volpe@studenti.unibs.it, marco.campi@unibs.it)

B. Cs. Csáji§ is with the Fraunhofer Project Center at the Institute for Computer Science and Control (SZTAKI), Hungarian Academy of Sciences (MTA), Kende utca 13-17, Budapest, Hungary, H-1111 (email: balazs.csaji@sztaki.mta.hu)

A. Carè* and E. Weyer** are with the Department of Electrical and Electronic Engineering, Melbourne School of Engineering, The University of Melbourne, 240 Grattan Street, Parkville, Melbourne, Victoria, 3010, Australia (email: {algo.care, ewey}@unimelb.edu.au)


II. Problem setting 

This section presents the linear regression problem and 
introduces our main assumptions. 

A. Data generation 

The data are generated by the following system

$$Y_t = \varphi_t^\mathrm{T}\theta^* + N_t, \qquad (1)$$

where $Y_t$ is the output, $N_t$ is the noise, $\varphi_t$ is the regressor vector, and $t$ is the discrete time index. Parameter $\theta^*$ is the true parameter to be estimated. The random variables $Y_t$ and $N_t$ are real-valued, while $\varphi_t$ and $\theta^*$ are $d$-dimensional real vectors. We consider a finite sample of size $n$, which consists of the regressors $\varphi_1, \dots, \varphi_n$ and the outputs $Y_1, \dots, Y_n$.

In addition, we assume that a set of instrumental variables $\{\psi_t\}_{t=1}^n$ is available to the user. The terms in the sequence $\{\psi_t\}$ must be correlated with the data and independent of the noise. Typically, past or filtered past inputs are used as instrumental variables.


B. Examples 

There are many examples in signal processing and control of systems that take the form of (1); see [13], [18]. An important example is the widely used ARX model

$$Y_t = \sum_{i=1}^{d_1} a_i^* Y_{t-i} + \sum_{i=1}^{d_2} b_i^* U_{t-i} + N_t,$$

where $\varphi_t = [Y_{t-1}, \dots, Y_{t-d_1}, U_{t-1}, \dots, U_{t-d_2}]^\mathrm{T}$ consists of past outputs and inputs, and the true parameter $\theta^* \in \mathbb{R}^{d_1+d_2}$ is the vector $[a_1^*, \dots, a_{d_1}^*, b_1^*, \dots, b_{d_2}^*]^\mathrm{T}$. An instrumental variables sequence $\{\psi_t\}$ can easily be obtained from the data. In particular, the instrumental variables vector can be constructed from the regressor $\varphi_t$ by replacing the (noise-dependent) outputs with other variables. Examples are delayed inputs, or noise-free reconstructed output terms that can be computed using a guess of the true system parameter. The latter approach is used and illustrated in Section VI.
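The regressor and instrument construction described above can be sketched in Python with NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation. The helper names `arx_regressors` and `reconstructed_instruments` are hypothetical. The data are 1-D arrays `y` and `u` of equal length, and the first `max(d1, d2)` samples serve only as initial conditions. The noise-free output is simulated from a user-supplied parameter guess `theta_guess = [a_1, ..., a_d1, b_1, ..., b_d2]`.

```python
import numpy as np


def arx_regressors(y, u, d1, d2):
    """Rows phi_t^T = [y_{t-1},...,y_{t-d1}, u_{t-1},...,u_{t-d2}] and targets y_t,
    for every (0-based) t at which all lagged values exist, i.e. t >= max(d1, d2)."""
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    t0 = max(d1, d2)
    rows = [[y[t - i] for i in range(1, d1 + 1)] + [u[t - i] for i in range(1, d2 + 1)]
            for t in range(t0, len(y))]
    return np.array(rows), y[t0:]


def reconstructed_instruments(y, u, theta_guess, d1, d2):
    """Hypothetical helper: build psi_t like phi_t, but with the measured outputs
    replaced by a noise-free output yhat simulated from the inputs and a parameter
    guess. The first max(d1, d2) measured outputs are used as initial conditions."""
    u = np.asarray(u, dtype=float)
    a, b = theta_guess[:d1], theta_guess[d1:]
    t0 = max(d1, d2)
    yhat = np.zeros(len(u))
    yhat[:t0] = np.asarray(y, dtype=float)[:t0]
    for t in range(t0, len(u)):
        yhat[t] = (sum(a[i - 1] * yhat[t - i] for i in range(1, d1 + 1))
                   + sum(b[i - 1] * u[t - i] for i in range(1, d2 + 1)))
    psi, _ = arx_regressors(yhat, u, d1, d2)
    return psi
```

Because `yhat` is driven only by the inputs and a fixed guess, the resulting $\psi_t$ do not depend on the noise when the guess is chosen independently of the data used by SPS.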


C. Basic assumptions 

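Our assumptions on the regressors, the instrumental variables and the noise are:

A1 $\{N_t\}$ is a sequence of independent random variables. Each $N_t$ has a symmetric probability distribution about zero.

A2 $\det(V_n) \neq 0$ almost surely, where

$$V_n \triangleq \frac{1}{n}\sum_{t=1}^n \psi_t\varphi_t^\mathrm{T}.$$

Note that A2 implies that the matrix

$$H_n \triangleq \frac{1}{n}\sum_{t=1}^n \psi_t\psi_t^\mathrm{T}$$

is (almost surely) invertible. The reason is that if $V_n$ is invertible, the instruments $\psi_1, \dots, \psi_n$ must span $\mathbb{R}^d$.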

As for the SPS of [4], the assumptions are rather mild. There are no moment or density requirements on the noise terms, and their distributions can change with time and need not be known. The strongest assumption on the noise is that it forms an independent sequence, but this can be somewhat relaxed with the suitably modified Block SPS [4]. The core assumption is the symmetry of the noise, which many standard distributions satisfy. These weak requirements make the method widely applicable.


III. Sign-Perturbed Sums with instrumental variables

In this section we introduce the generalization of SPS 
using instrumental variables. 


A. Intuitive idea 

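First, recall that the instrumental variables estimate $\hat\theta_n$ is the solution to a modified version of the normal equations, i.e.,

$$\sum_{t=1}^n \psi_t\left(Y_t - \varphi_t^\mathrm{T}\theta\right) = 0, \qquad (2)$$

and the instrumental variables (IV) estimate is

$$\hat\theta_n = \left(\sum_{t=1}^n \psi_t\varphi_t^\mathrm{T}\right)^{-1}\sum_{t=1}^n \psi_t Y_t.$$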
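For reference, the IV estimate can be computed as a short NumPy sketch. The function name is hypothetical, and regressors and instruments are stored as the rows of `(n, d)` arrays.

```python
import numpy as np


def iv_estimate(phi, psi, y):
    """IV estimate theta_hat = (sum_t psi_t phi_t^T)^{-1} sum_t psi_t Y_t.

    phi, psi: (n, d) arrays whose t-th rows are phi_t^T and psi_t^T; y: (n,) outputs.
    """
    lhs = psi.T @ phi                   # sum_t psi_t phi_t^T  (d x d)
    rhs = psi.T @ y                     # sum_t psi_t Y_t      (d,)
    return np.linalg.solve(lhs, rhs)    # solve the modified normal equations (2)
```

Solving the linear system rather than forming the inverse explicitly is the usual numerically safer choice. The result satisfies the modified normal equations (2) up to rounding.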

Then, following the same ideas as in [4] for the construction of the SPS method, we can build $m-1$ sign-perturbed versions of equation (2) and define the sign-perturbed sums as

$$S_i(\theta) \triangleq H_n^{-1/2}\frac{1}{n}\sum_{t=1}^n \psi_t\alpha_{i,t}\left(Y_t - \varphi_t^\mathrm{T}\theta\right),$$

$i \in \{1, \dots, m-1\}$. Here $H_n^{1/2}$ is the principal square root of $H_n$, which is introduced to give the confidence regions a better shape. The $\{\alpha_{i,t}\}$ are i.i.d. Rademacher variables, i.e., they take the values $\pm 1$ with probability $1/2$ each. Without applying sign perturbations, we can also define the reference sum as

$$S_0(\theta) \triangleq H_n^{-1/2}\frac{1}{n}\sum_{t=1}^n \psi_t\left(Y_t - \varphi_t^\mathrm{T}\theta\right).$$

An important property of these functions is that, for $\theta = \theta^*$, we have

$$S_0(\theta^*) = H_n^{-1/2}\frac{1}{n}\sum_{t=1}^n \psi_t N_t, \qquad S_i(\theta^*) = H_n^{-1/2}\frac{1}{n}\sum_{t=1}^n \alpha_{i,t}\psi_t N_t,$$

and such variables are uniformly ordered. That is, once the values of $\{\|S_i(\theta^*)\|^2\}_{i=0}^{m-1}$ have been sorted according to a particular strict total order, each $\|S_i(\theta^*)\|^2$ has the same probability of being ranked in any given position (see Appendix A). This observation is crucial to SPS. SPS builds the confidence regions by excluding those $\theta$ for which $\|S_0(\theta)\|^2$ is among the $q$ largest values, and the set so constructed has exact probability $1 - q/m$ of containing the true parameter.¹ Moreover, when $\|\theta' - \theta^*\|$ is large, $\|S_0(\theta')\|^2$ tends to be the largest of the $m$ functions. Therefore, define $\pi$ as a random permutation² of the set $\{0, \dots, m-1\}$ and the strict total order $\succ_\pi$ as

$$Z_j \succ_\pi Z_k \iff (Z_j > Z_k) \lor \left(Z_j = Z_k \land \pi(j) > \pi(k)\right),$$

where $Z_i = \|S_i(\theta')\|^2$. With this order, values far away from $\theta^*$ are excluded from the confidence set.
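The tie-breaking order $\succ_\pi$, and the rank of $\|S_0\|^2$ that it induces, can be written in a few lines of plain Python. This is an illustrative sketch with hypothetical names: `z[i]` stands for $Z_i$ and `pi[i]` for $\pi(i)$.

```python
def succ_pi(z, j, k, pi):
    """Strict total order: Z_j >_pi Z_k iff Z_j > Z_k, or Z_j == Z_k and pi(j) > pi(k)."""
    return z[j] > z[k] or (z[j] == z[k] and pi[j] > pi[k])


def rank_of_reference(z, pi):
    """Rank of Z_0 among Z_0..Z_{m-1} in ascending >_pi order (1 = smallest):
    one plus the number of Z_i that Z_0 strictly dominates."""
    return 1 + sum(succ_pi(z, 0, i, pi) for i in range(1, len(z)))
```

Since $\succ_\pi$ is a strict total order, two different indices are never tied under it, so the rank is always well defined.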

B. Formal construction of the confidence region 

The pseudocode of the generalized SPS algorithm is presented in two parts. The initialization (Table I) sets the main global parameters and generates the random objects needed for the construction. In the initialization, the user provides the desired confidence probability $p$. The second part (Table II) evaluates an indicator function, SPS-Indicator($\theta$), which determines whether a particular parameter $\theta$ is included in the confidence region.

¹Notice that many pairs $q$ and $m$ give the same ratio $q/m$. Refer to [4] for more discussion on the choice of $q$ and $m$.

²The random permutation $\pi$ is used to break ties when two different $\|S_i(\theta')\|^2$ variables take the same value.

Pseudocode: SPS-Initialization

1. Given a (rational) confidence probability $p \in (0,1)$, set integers $m > q > 0$ such that $p = 1 - q/m$;

2. Calculate the outer product
$$H_n \triangleq \frac{1}{n}\sum_{t=1}^n \psi_t\psi_t^\mathrm{T},$$
and find the principal square root $H_n^{1/2}$, such that $H_n^{1/2}H_n^{1/2} = H_n$;

3. Generate $n(m-1)$ i.i.d. random signs $\{\alpha_{i,t}\}$ with $\mathbb{P}(\alpha_{i,t} = 1) = \mathbb{P}(\alpha_{i,t} = -1) = \frac{1}{2}$, for $i \in \{1, \dots, m-1\}$ and $t \in \{1, \dots, n\}$;

4. Generate a random permutation $\pi$ of the set $\{0, \dots, m-1\}$, where each of the $m!$ possible permutations has the same probability $1/(m!)$ of being selected.

TABLE I


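Pseudocode: SPS-Indicator($\theta$)

1. For the given $\theta$, compute the prediction errors for $t \in \{1, \dots, n\}$
$$\varepsilon_t(\theta) \triangleq Y_t - \varphi_t^\mathrm{T}\theta;$$

2. Evaluate
$$S_0(\theta) \triangleq H_n^{-1/2}\frac{1}{n}\sum_{t=1}^n \psi_t\varepsilon_t(\theta), \qquad S_i(\theta) \triangleq H_n^{-1/2}\frac{1}{n}\sum_{t=1}^n \alpha_{i,t}\psi_t\varepsilon_t(\theta),$$
for $i \in \{1, \dots, m-1\}$;

3. Order the scalars $\{\|S_i(\theta)\|^2\}_{i=0}^{m-1}$ according to $\succ_\pi$;

4. Compute the rank $\mathcal{R}(\theta)$ of $\|S_0(\theta)\|^2$ in the ordering, where $\mathcal{R}(\theta) = 1$ if $\|S_0(\theta)\|^2$ is the smallest in the ordering, $\mathcal{R}(\theta) = 2$ if $\|S_0(\theta)\|^2$ is the second smallest, and so on;

5. Return 1 if $\mathcal{R}(\theta) \le m - q$, otherwise return 0.

TABLE II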
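Tables I and II can be combined into a short NumPy sketch. It is our own illustration, not the authors' code, and rests on several assumptions. The function names are hypothetical. Regressors and instruments are stored row-wise in `(n, d)` arrays, and `rng` is a `numpy.random.Generator`. Row 0 of the sign matrix is fixed to $+1$, so that one vectorized expression yields both $S_0$ and the $S_i$.

```python
import numpy as np


def sps_init(psi, m, rng):
    """Table I (sketch): H_n^{-1/2}, random signs and the tie-breaking permutation.
    Row 0 of alpha is all ones, so that row i of the sums below is S_i."""
    n = psi.shape[0]
    H = psi.T @ psi / n                          # H_n = (1/n) sum_t psi_t psi_t^T
    w, U = np.linalg.eigh(H)                     # H_n symmetric positive definite under A2
    H_inv_sqrt = U @ np.diag(w ** -0.5) @ U.T    # inverse of the principal square root
    alpha = rng.choice([-1.0, 1.0], size=(m, n))
    alpha[0, :] = 1.0                            # unperturbed reference sum S_0
    pi = rng.permutation(m)                      # uniformly random permutation of {0..m-1}
    return H_inv_sqrt, alpha, pi


def sps_indicator(theta, phi, psi, y, H_inv_sqrt, alpha, pi, q):
    """Table II (sketch): 1 if theta belongs to the SPS region, else 0."""
    n, m = len(y), alpha.shape[0]
    eps = y - phi @ theta                                  # prediction errors eps_t(theta)
    S = ((alpha * eps) @ psi / n) @ H_inv_sqrt             # row i is S_i(theta)^T
    z = np.sum(S ** 2, axis=1)                             # z[i] = ||S_i(theta)||^2
    rank = 1 + sum(z[0] > z[i] or (z[0] == z[i] and pi[0] > pi[i])
                   for i in range(1, m))                   # rank of z[0] under >_pi
    return int(rank <= m - q)
```

The random objects are generated once and reused for every $\theta$ that is tested. This is what makes the region a single well-defined set, rather than a different random draw at each grid point.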

Using this construction, we can define the $p$-level SPS confidence region as follows:

$$\Theta_n \triangleq \left\{\theta \in \mathbb{R}^d : \text{SPS-Indicator}(\theta) = 1\right\}.$$

Note that, at the instrumental variables estimate $\hat\theta_n$, it holds that $S_0(\hat\theta_n) = 0$, so $\|S_0(\hat\theta_n)\|^2$ is the smallest of the $m$ values. Therefore, except in pathological cases, $\hat\theta_n$ is included in the SPS confidence region, and the set is built around $\hat\theta_n$.


IV. Theoretical results 

A. Exact confidence 

The most important property of the SPS method is that the generated regions have exact confidence probabilities for any finite sample. The following theorem holds.

Theorem 1: Assuming A1 and A2, the confidence probability of the constructed confidence region is exactly $p$, that is,

$$\mathbb{P}(\theta^* \in \Theta_n) = 1 - \frac{q}{m} = p.$$

The proof of the theorem, which follows the lines of the proof of Theorem 1 of [4], can be found in Appendix A. Since the confidence probability is exact, no conservatism is introduced. Moreover, the statistical assumptions imposed on the noise are rather weak. Indeed, the noise distribution can change over time, and there are no moment or density requirements whatsoever.
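Theorem 1 can be checked empirically with a minimal Monte Carlo sketch. The setting is our own and is not an experiment from the paper. It uses a first-order ARX system with i.i.d. Laplacian noise, and instruments made of delayed inputs so that they are exactly independent of the noise. The helper name `sps_covers_theta_star` is hypothetical. With $m = 20$ and $q = 5$, the theorem predicts a coverage of $1 - q/m = 0.75$, even for $n = 25$. The empirical frequency should lie close to this value, up to Monte Carlo error.

```python
import numpy as np


def sps_covers_theta_star(theta_star, n, m, q, rng):
    """One trial: simulate Y_t = a Y_{t-1} + b U_t + N_t, use psi_t = [U_{t-1}, U_t]
    (independent of the noise), and report whether rank(||S_0(theta*)||^2) <= m - q."""
    a, b = theta_star
    u = rng.normal(size=n + 2)
    noise = rng.laplace(scale=1 / np.sqrt(2), size=n + 2)   # symmetric, variance 1
    y = np.zeros(n + 2)
    for t in range(1, n + 2):
        y[t] = a * y[t - 1] + b * u[t] + noise[t]
    phi = np.column_stack([y[1:-1], u[2:]])      # phi_t = [Y_{t-1}, U_t], n rows
    psi = np.column_stack([u[1:-1], u[2:]])      # psi_t = [U_{t-1}, U_t]
    eps = y[2:] - phi @ np.asarray(theta_star)   # equals noise[2:] up to rounding
    w, U = np.linalg.eigh(psi.T @ psi / n)
    H_inv_sqrt = U @ np.diag(w ** -0.5) @ U.T
    alpha = rng.choice([-1.0, 1.0], size=(m, n))
    alpha[0, :] = 1.0
    pi = rng.permutation(m)
    S = ((alpha * eps) @ psi / n) @ H_inv_sqrt
    z = np.sum(S ** 2, axis=1)
    rank = 1 + sum(z[0] > z[i] or (z[0] == z[i] and pi[0] > pi[i]) for i in range(1, m))
    return rank <= m - q


rng = np.random.default_rng(0)
trials = 2000
hits = sum(sps_covers_theta_star((0.7, 1.0), n=25, m=20, q=5, rng=rng)
           for _ in range(trials))
coverage = hits / trials   # Theorem 1 predicts exactly 1 - q/m = 0.75
```

With 2000 trials, the standard deviation of the empirical coverage is about 0.01. Deviations much larger than a few hundredths would therefore point to a bug rather than to chance.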

B. Strong consistency 

An important aspect of the confidence region is its size. Clearly, for any finite sample the size of the region depends strongly on the statistical properties of the noise. However, we show that asymptotically the SPS regions become smaller and smaller, shrinking to the true parameter. Indeed, the SPS algorithm is strongly consistent under the following (rather mild) assumptions.

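A3 There exists a positive definite matrix $H$ such that $\lim_{n\to\infty} H_n = H$, almost surely.

A4 There exists an invertible matrix $V$ such that $\lim_{n\to\infty} V_n = V$, almost surely.

A5 (regressor growth rate restriction):
$$\sum_{t=1}^{\infty}\frac{\|\varphi_t\|^4}{t^2} < \infty, \quad \text{almost surely}.$$

A6 (instruments growth rate restriction):
$$\sum_{t=1}^{\infty}\frac{\|\psi_t\|^4}{t^2} < \infty, \quad \text{almost surely}.$$

A7 (noise variance growth rate restriction):
$$\sum_{t=1}^{\infty}\frac{\left(\mathbb{E}[N_t^2]\right)^2}{t^2} < \infty.$$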


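The following theorem holds.

Theorem 2: Assuming A1, A2, A3, A4, A5, A6 and A7, for all $\varepsilon > 0$ there almost surely exists an $N$ such that, for all $n > N$,
$$\Theta_n \subseteq \left\{\theta \in \mathbb{R}^d : \|\theta - \theta^*\| \le \varepsilon\right\}.$$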


The proof of the theorem can be found in Appendix B. The claim states that, as the sample size increases, the confidence regions $\{\Theta_n\}$ will eventually be included (almost surely) in any norm ball centered at $\theta^*$. The regions generated by the generalization of SPS introduced in this paper have no theoretical guarantee of being bounded, but they normally are. Moreover, the strong consistency result implies that they are asymptotically bounded with probability 1.






















V. Ellipsoidal approximation algorithm 

The purpose of the SPS-Indicator function is to check whether a given $\theta$ belongs to the confidence region. In particular, it computes the functions $\{\|S_i(\theta)\|^2\}_{i=0}^{m-1}$ for that specific $\theta$ and compares them. The SPS region can thus be constructed by discretizing the space of interest with a grid, possibly a very dense one, and checking whether the grid points belong to the region. However, this approach is computationally demanding, and it becomes slower and slower as the dimension increases. Here, we introduce a generalization of the ellipsoidal outer-approximation algorithm previously developed for the SPS of [4], [20]. The algorithm yields an ellipsoidal over-bound that can be computed efficiently, in polynomial time.


A. Ellipsoidal outer approximation 

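Expanding $\|S_0(\theta)\|^2$ and using $\frac{1}{n}\sum_{t=1}^n \psi_tY_t = V_n\hat\theta_n$, we find that it can be written as

$$\begin{aligned}\|S_0(\theta)\|^2 &= \left\|H_n^{-1/2}\frac{1}{n}\sum_{t=1}^n \psi_t\left(Y_t - \varphi_t^\mathrm{T}\theta\right)\right\|^2 = \left\|H_n^{-1/2}V_n(\hat\theta_n - \theta)\right\|^2\\ &= (\theta - \hat\theta_n)^\mathrm{T}V_n^\mathrm{T}H_n^{-1}V_n(\theta - \hat\theta_n).\end{aligned}$$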
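The identity above is easy to confirm numerically. The sketch below is our own, with hypothetical function names and random test data. Because the principal square root is symmetric, $\|H_n^{-1/2}g\|^2 = g^\mathrm{T}H_n^{-1}g$, so no matrix square root has to be formed.

```python
import numpy as np


def s0_squared_direct(theta, phi, psi, y):
    """||S_0(theta)||^2 = g^T H_n^{-1} g with g = (1/n) sum_t psi_t (Y_t - phi_t^T theta)."""
    n = len(y)
    H = psi.T @ psi / n
    g = psi.T @ (y - phi @ theta) / n
    return g @ np.linalg.solve(H, g)


def s0_squared_quadratic(theta, phi, psi, y):
    """Same quantity as (theta - theta_hat)^T V_n^T H_n^{-1} V_n (theta - theta_hat)."""
    n = len(y)
    H = psi.T @ psi / n
    V = psi.T @ phi / n
    theta_hat = np.linalg.solve(V, psi.T @ y / n)   # IV estimate
    e = V @ (theta - theta_hat)
    return e @ np.linalg.solve(H, e)
```

The quadratic form shows that the level sets of $\|S_0(\theta)\|^2$ are ellipsoids centered at $\hat\theta_n$ with shape matrix $V_n^\mathrm{T}H_n^{-1}V_n$.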








Since we are looking for an ellipsoidal over-bound, we can ignore the random ordering used when $\|S_0(\theta)\|^2$ and $\|S_i(\theta)\|^2$ take the same value. It suffices to consider the set of values of $\theta$ at which at least $q$ of the $\|S_i(\theta)\|^2$ are larger than or equal to $\|S_0(\theta)\|^2$, i.e.,

$$\Theta_n \subseteq \left\{\theta \in \mathbb{R}^d : (\theta - \hat\theta_n)^\mathrm{T}V_n^\mathrm{T}H_n^{-1}V_n(\theta - \hat\theta_n) \le r(\theta)\right\},$$

where $r(\theta)$ is the $q$th largest value of the functions $\|S_i(\theta)\|^2$, $i = 1, \dots, m-1$.

The idea is to find an over-bound by replacing $r(\theta)$ with a parameter-independent $r$. This yields an outer approximation that is a guaranteed confidence region for finitely many data points. Moreover, since it is described in terms of $\hat\theta_n$, $V_n$, $H_n$ and $r$, it has a compact representation.


B. Convex programming formulation 

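Comparing $\|S_0(\theta)\|^2$ with a single $\|S_i(\theta)\|^2$ function, we note that

$$\|S_i(\theta)\|^2 = \left\|H_n^{-1/2}(p_i - Q_i\theta)\right\|^2 = (p_i - Q_i\theta)^\mathrm{T}H_n^{-1}(p_i - Q_i\theta),$$

where the matrix $Q_i$ and the vector $p_i$ are defined as

$$Q_i \triangleq \frac{1}{n}\sum_{t=1}^n \alpha_{i,t}\psi_t\varphi_t^\mathrm{T}, \qquad p_i \triangleq \frac{1}{n}\sum_{t=1}^n \alpha_{i,t}\psi_tY_t.$$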
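The following NumPy sketch computes $Q_i$ and $p_i$ for one sign sequence (the helper name is hypothetical). The accompanying check confirms that $\|S_i(\theta)\|^2$ equals the expanded quadratic $\theta^\mathrm{T}Q_i^\mathrm{T}H_n^{-1}Q_i\theta - 2\theta^\mathrm{T}Q_i^\mathrm{T}H_n^{-1}p_i + p_i^\mathrm{T}H_n^{-1}p_i$.

```python
import numpy as np


def perturbed_moments(alpha_i, phi, psi, y):
    """Q_i = (1/n) sum_t alpha_{i,t} psi_t phi_t^T and p_i = (1/n) sum_t alpha_{i,t} psi_t Y_t."""
    n = len(y)
    weighted_psi = alpha_i[:, None] * psi      # row t: alpha_{i,t} psi_t^T
    return weighted_psi.T @ phi / n, weighted_psi.T @ y / n
```

Since $Q_i$ and $p_i$ do not depend on $\theta$, they are computed once per sign sequence. Each $\|S_i(\theta)\|^2$ then becomes an explicit quadratic function of $\theta$.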


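First, observe that

$$\sup_{\theta:\,\|S_0(\theta)\|^2 \le \|S_i(\theta)\|^2}\|S_i(\theta)\|^2 = \sup_{\theta:\,\|S_0(\theta)\|^2 \le \|S_i(\theta)\|^2}\|S_0(\theta)\|^2.$$

Such a supremum is finite only if the matrix $V_n^\mathrm{T}H_n^{-1}V_n - Q_i^\mathrm{T}H_n^{-1}Q_i$ is positive semidefinite. If this is the case, we want to compute this maximum. Defining $z = H_n^{-1/2}V_n(\theta - \hat\theta_n)$, we can find the quantity

$$\max_{\theta:\,\|S_0(\theta)\|^2 \le \|S_i(\theta)\|^2}\|S_0(\theta)\|^2$$

as the solution of the following quadratic programming problem with a single quadratic constraint:

$$\begin{aligned}\text{maximize}\quad & \|z\|^2\\ \text{subject to}\quad & z^\mathrm{T}A_iz + 2z^\mathrm{T}b_i + c_i \le 0,\end{aligned}$$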


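where $A_i$, $b_i$ and $c_i$ are defined as

$$A_i \triangleq I - H_n^{1/2}V_n^{-\mathrm{T}}Q_i^\mathrm{T}H_n^{-1}Q_iV_n^{-1}H_n^{1/2},$$

$$b_i \triangleq H_n^{1/2}V_n^{-\mathrm{T}}Q_i^\mathrm{T}H_n^{-1}\left(p_i - Q_i\hat\theta_n\right),$$

$$c_i \triangleq -p_i^\mathrm{T}H_n^{-1}p_i + 2\hat\theta_n^\mathrm{T}Q_i^\mathrm{T}H_n^{-1}p_i - \hat\theta_n^\mathrm{T}Q_i^\mathrm{T}H_n^{-1}Q_i\hat\theta_n.$$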
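The definitions of $A_i$, $b_i$ and $c_i$ can be checked against the inequality $\|S_0(\theta)\|^2 \le \|S_i(\theta)\|^2$. In the $z$ coordinates, the left-hand side of the constraint should equal $\|S_0(\theta)\|^2 - \|S_i(\theta)\|^2$. The sketch below builds the three quantities. The helper name is hypothetical, and $M = Q_iV_n^{-1}H_n^{1/2}$ is an auxiliary matrix we introduce. It writes $c_i$ as $-(p_i - Q_i\hat\theta_n)^\mathrm{T}H_n^{-1}(p_i - Q_i\hat\theta_n)$, which expands to the three-term expression above.

```python
import numpy as np


def constraint_coefficients(Q, p, V, H, theta_hat):
    """A_i, b_i, c_i of the constraint z^T A_i z + 2 z^T b_i + c_i <= 0, in the
    coordinates z = H_n^{-1/2} V_n (theta - theta_hat)."""
    w, U = np.linalg.eigh(H)
    H_half = U @ np.diag(np.sqrt(w)) @ U.T       # principal square root H_n^{1/2}
    H_inv = np.linalg.inv(H)
    M = Q @ np.linalg.solve(V, H_half)           # Q_i V_n^{-1} H_n^{1/2}
    r = p - Q @ theta_hat                        # p_i - Q_i theta_hat
    A = np.eye(len(p)) - M.T @ H_inv @ M
    b = M.T @ H_inv @ r
    c = -(r @ H_inv @ r)
    return A, b, c
```

The check substitutes $\theta = \hat\theta_n + V_n^{-1}H_n^{1/2}z$. This gives $p_i - Q_i\theta = (p_i - Q_i\hat\theta_n) - Mz$, from which the three coefficients follow directly.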


This program is not convex in general, because the Hessian of the quadratic constraint is not necessarily positive definite. However, it can be shown [1, Appendix B] that strong duality holds. The value of the above optimization problem is therefore equal to the value of its dual, which can be formulated as the following semidefinite programming problem:

$$\begin{aligned}\text{minimize}\quad & \gamma\\ \text{subject to}\quad & \lambda \ge 0,\\ & \begin{bmatrix} -I + \lambda A_i & \lambda b_i\\ \lambda b_i^\mathrm{T} & \lambda c_i + \gamma \end{bmatrix} \succeq 0,\end{aligned} \qquad (3)$$

where "$\succeq 0$" denotes that a matrix is positive semidefinite. This program is convex and can be solved efficiently (in polynomial time) using, e.g., MATLAB and a toolbox such as CVX [8].
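Program (3) is meant for a generic SDP solver such as CVX. Because it has only one constraint, a dependency-free alternative is to eliminate $\gamma$ with a Schur complement. For $\lambda A_i - I \succ 0$, the smallest feasible $\gamma$ is $g(\lambda) = \lambda^2 b_i^\mathrm{T}(\lambda A_i - I)^{-1}b_i - \lambda c_i$, which is convex in $\lambda$ and can be minimized by bisection on its derivative. This is our own sketch (hypothetical name `gamma_star`), not the method used in the paper. It assumes $A_i$ is positive definite and returns infinity otherwise; semidefinite corner cases are not treated.

```python
import numpy as np


def gamma_star(A, b, c, tol=1e-12, max_iter=200):
    """Sketch: optimal value of  max ||z||^2  s.t.  z^T A z + 2 z^T b + c <= 0,
    via the scalar dual g(lam) = lam^2 b^T (lam A - I)^{-1} b - lam c on lam A - I > 0."""
    w, U = np.linalg.eigh(A)
    if w[0] <= 0:
        return np.inf                           # feasible set unbounded (if non-empty)
    if c - b @ np.linalg.solve(A, b) > 0:
        return -np.inf                          # empty constraint set
    beta2 = (U.T @ b) ** 2                      # b expressed in the eigenbasis of A

    def g(lam):                                 # dual objective, convex on lam > 1/w[0]
        return np.sum(lam ** 2 * beta2 / (lam * w - 1)) - lam * c

    def dg(lam):                                # derivative of g, increasing in lam
        return np.sum(beta2 * lam * (lam * w - 2) / (lam * w - 1) ** 2) - c

    lo = (1 + 1e-12) / w[0]                     # just to the right of the pole 1/w[0]
    if dg(lo) >= 0:                             # minimizer at the boundary (e.g. b = 0)
        return g(lo)
    hi = 2 * lo
    for _ in range(max_iter):                   # expand until the derivative is >= 0
        if dg(hi) >= 0:
            break
        hi *= 2
    for _ in range(max_iter):                   # bisection on the derivative
        mid = 0.5 * (lo + hi)
        if dg(mid) < 0:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol * hi:
            break
    return g(0.5 * (lo + hi))
```

Example: maximizing $z^2$ subject to $2z^2 + 2z - 4 \le 0$ (feasible set $[-2, 1]$) gives 4. With an SDP modelling tool, one would instead pose (3) directly, using a positive semidefinite constraint on the $(d+1)\times(d+1)$ matrix.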

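Defining $\gamma_i^*$ as the value of program (3), we have

$$\{\theta : \|S_0(\theta)\|^2 \le \|S_i(\theta)\|^2\} \subseteq \Big\{\theta : \|S_0(\theta)\|^2 \le \sup_{\theta':\,\|S_0(\theta')\|^2 \le \|S_i(\theta')\|^2}\|S_i(\theta')\|^2\Big\} = \{\theta : \|S_0(\theta)\|^2 \le \gamma_i^*\}.$$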

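The inequality $\|S_0(\theta)\|^2 \le \|S_i(\theta)\|^2$ can be rewritten as

$$(\theta - \hat\theta_n)^\mathrm{T}V_n^\mathrm{T}H_n^{-1}V_n(\theta - \hat\theta_n) \le \theta^\mathrm{T}Q_i^\mathrm{T}H_n^{-1}Q_i\theta - 2\theta^\mathrm{T}Q_i^\mathrm{T}H_n^{-1}p_i + p_i^\mathrm{T}H_n^{-1}p_i.$$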


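Thus,

$$\Theta_n \subseteq \hat\Theta_n \triangleq \left\{\theta \in \mathbb{R}^d : (\theta - \hat\theta_n)^\mathrm{T}V_n^\mathrm{T}H_n^{-1}V_n(\theta - \hat\theta_n) \le r\right\},$$

where $r$ is the $q$th largest value of $\gamma_i^*$, $i = 1, \dots, m-1$. The set $\hat\Theta_n$ is the outer approximation we were looking for. Clearly, it holds that

$$\mathbb{P}(\theta^* \in \hat\Theta_n) \ge 1 - \frac{q}{m} = p$$

for any finite $n$. The pseudocode for computing $\hat\Theta_n$ is given in Table III.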


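Pseudocode: SPS-Outer-Approximation

1. Compute the instrumental variables estimate
$$\hat\theta_n = \left(\sum_{t=1}^n \psi_t\varphi_t^\mathrm{T}\right)^{-1}\sum_{t=1}^n \psi_tY_t;$$

2. For $i \in \{1, \dots, m-1\}$, solve the optimization problem (3), and let $\gamma_i^*$ be the optimal value (or $\infty$ if the problem is infeasible);

3. Let $r$ be the $q$th largest $\gamma_i^*$ value;

4. The outer approximation of the SPS confidence region is given by the ellipsoid
$$\hat\Theta_n = \left\{\theta \in \mathbb{R}^d : (\theta - \hat\theta_n)^\mathrm{T}V_n^\mathrm{T}H_n^{-1}V_n(\theta - \hat\theta_n) \le r\right\}.$$

TABLE III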

VI. Numerical experiments 
In this section we illustrate SPS with numerical experiments. First, we apply the method to a simple first-order ARX system. Then, SPS is applied to a real-world identification problem, with the purpose of showing that the method is robust to violations of the assumptions under which the guarantees in this paper are established.

A. Simulation example 

We consider the following data-generating ARX system

$$Y_t = a^*Y_{t-1} + b^*U_t + N_t,$$

where $a^* = 0.7$, $b^* = 1$, and $\{U_t\}$ is a sequence of random inputs generated as

$$U_t = 0.75\,U_{t-1} + V_t,$$

with $\{V_t\}$ a sequence of i.i.d. Gaussian random variables $\mathcal{N}(0,1)$. The noise $\{N_t\}$ is a sequence of i.i.d. Laplacian random variables with zero mean and variance 1. We consider a finite sample of size $n$ consisting of the pairs $\{(Y_t, \varphi_t)\}_{t=1}^n$, where $\varphi_t = [Y_{t-1}, U_t]^\mathrm{T}$. The instrumental variables are constructed from the data. In particular, we replace the autoregressive components of the regressors $\varphi_t$, for $t = 2, \dots, n$, with reconstructed outputs. First, we find an estimate $\hat\theta_{LS}$ of the true parameter via least squares on $\{(Y_t, \varphi_t)\}_{t=1}^n$.³ We then use this estimate to build the noise-free sequence $\{\hat Y_t\}_{t=1}^n$ with the recursive procedure

$$\hat Y_t = \hat a\hat Y_{t-1} + \hat bU_t,$$

where $\hat\theta_{LS} = [\hat a, \hat b]^\mathrm{T}$, and $Y_1$ is used as the initialization value. Finally, the instrumental variables are

$$\psi_t = [\hat Y_{t-1}, U_t]^\mathrm{T}.$$

Note that, strictly speaking, these instrumental variables are not completely independent of the noise, because the noise realization enters the least squares estimate. However, the least squares estimate averages the noise out, so its effect is toned down. If the least squares estimate were built from a data set independent of the one used by SPS, the constructed regions would be rigorous. Yet the difference would be minimal, so for the sake of simplicity we used just one data set.

³We could also use a guess (even an imprecise one) of the true parameter coming from some a priori knowledge.
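The data generation and instrument construction of this example can be sketched as follows. Several details are our own assumptions, not taken from the paper: the seed, the initial conditions $U_1 = V_1$ and $Y_1 = U_1 + N_1$ (zero output before $t = 1$), and the use of 0-based arrays. Only the pairs for $t = 2, \dots, n$ enter the regression, because $Y_0$ is not available. The Laplace scale is $1/\sqrt{2}$, since a Laplace variable with scale $s$ has variance $2s^2$.

```python
import numpy as np

rng = np.random.default_rng(1)            # arbitrary seed
n, a_true, b_true = 25, 0.7, 1.0

# Inputs U_t = 0.75 U_{t-1} + V_t with V_t ~ N(0, 1); U_1 = V_1 is our assumption.
v = rng.normal(size=n)
u = np.empty(n)
u[0] = v[0]
for t in range(1, n):
    u[t] = 0.75 * u[t - 1] + v[t]

# Laplacian noise with variance 1 (variance = 2 * scale^2).
noise = rng.laplace(scale=1 / np.sqrt(2), size=n)
y = np.empty(n)
y[0] = b_true * u[0] + noise[0]           # assumption: zero output before t = 1
for t in range(1, n):
    y[t] = a_true * y[t - 1] + b_true * u[t] + noise[t]

# Regressors phi_t = [Y_{t-1}, U_t]^T for t = 2..n.
phi = np.column_stack([y[:-1], u[1:]])
Y = y[1:]

# Least-squares guess, then a noise-free output driven only by the inputs.
a_ls, b_ls = np.linalg.lstsq(phi, Y, rcond=None)[0]
y_hat = np.empty(n)
y_hat[0] = y[0]                           # Y_1 used as initial value
for t in range(1, n):
    y_hat[t] = a_ls * y_hat[t - 1] + b_ls * u[t]

psi = np.column_stack([y_hat[:-1], u[1:]])          # psi_t = [Yhat_{t-1}, U_t]^T
theta_iv = np.linalg.solve(psi.T @ phi, psi.T @ Y)  # IV estimate used by SPS
```

Only the autoregressive column differs between `phi` and `psi`; the input column is shared, as described above.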

Based on $n = 25$ data points $\{(Y_t, \varphi_t)\}_{t=1}^{25}$, we want to find a 95% confidence region for $\theta^*$. We build 99 sign-perturbed sums ($m$ is set to 100). The confidence region is constructed as the set of values of $\theta$ for which at least $q = 5$ of the functions $\|S_i(\theta)\|^2$, $i = 1, \dots, 99$, are "larger"⁴ than $\|S_0(\theta)\|^2$. An example of a constructed confidence region is illustrated in Figure 1. The solid red line has been obtained by evaluating the SPS-Indicator($\theta$) function of Table II on a very fine grid.

Fig. 1. 95% confidence region, n = 25, m = 100.

B. Real-world data experiment 

Working with real-world data is almost always a challenge. Usually, the user can only presume the nature of the best mathematical representation of the system, and most of the time the real system does not lie in the model class. Moreover, knowledge of the noise characteristics is limited. All these issues make the identification process much more complicated. Nevertheless, we still want to apply SPS in such a scenario. The theoretical results cannot be expected to hold rigorously (e.g., because the real system does not lie in the model class), but we hope that they hold approximately.

Our real-world data set comes from the photovoltaic energy production measurements of a prototype energy-positive
public lighting microgrid (E-nGrid) system [5]. In particular, 
the available data contain the hourly historical progression 
of the amount of energy produced. 

⁴According to the strict total order $\succ_\pi$, with a random permutation $\pi$.













The model class is an ARX(5, 4), i.e.,

$$Y_t = \sum_{i=1}^{5} a_iY_{t-i} + \sum_{i=1}^{4} b_iU_{t-i+1} + N_t = \varphi_t^\mathrm{T}\theta + N_t,$$

where $Y_t$ is the amount of produced energy and $U_t$ is an auxiliary input given by the clear-sky predictions of the amount of energy produced (see [5] for more details).

To carry out our tests, we first estimated via least squares a "true parameter" $\theta^*$, based on the first half of the large data set available (more than 4200 observations). After $\theta^* = [a_1^*, \dots, a_5^*, b_1^*, \dots, b_4^*]^\mathrm{T}$ was found, the residuals $e_t = Y_t - \sum_{i=1}^{5} a_i^*Y_{t-i} - \sum_{i=1}^{4} b_i^*U_{t-i+1}$ were tested with the Durbin-Watson test [6]. The test returned a p-value greater than 95% for the uncorrelation hypothesis, supporting the choice of the orders 5 and 4 [19].
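The Durbin-Watson statistic itself is a simple ratio of sums and can be sketched in a few lines (hypothetical function name). This computes only the statistic. The p-value quoted above additionally requires the null distribution of the statistic (bounds tables or a numerical approximation), which we do not reproduce. For comparison, statsmodels exposes the same statistic as `statsmodels.stats.stattools.durbin_watson`.

```python
import numpy as np


def durbin_watson(e):
    """d = sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2. Values near 2 are consistent
    with no first-order autocorrelation; values toward 0 (or 4) indicate positive
    (or negative) autocorrelation of the residuals."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
```

For example, the alternating residuals `[1, -1, 1, -1]` give d = 12 / 4 = 3, indicating negative autocorrelation.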

Then, SPS was used with the second half of the data set. The instrumental variables $\{\psi_t\}$ were built from the data by replacing the autoregressive components of the regressor with a reconstructed, noise-independent trajectory of the output $\{\hat Y_t\}$, similarly to the previous example. The estimate of the "true parameter" used to build this sequence was obtained via least squares on an extra subset of 100 samples, which was not used afterwards.

Finally, in a Monte Carlo approach, we evaluated the empirical probability with which $\theta^*$ belonged to SPS regions built from many (1000) different data subsets. Each subset was constructed with pairs $\{(Y_t, \varphi_t)\}$ drawn randomly (non-sequentially) from the second half of the global data set. The size of each subset varied from 75 to 250 observations. The parameters $m$ and $q$ were always set to 100 and 10, respectively, targeting a (desired) confidence probability of 90%.

The final results, reported in Table IV, show good agreement between theory and empirical results.

n      Empirical confidence
75     0.886
100    0.900
150    0.886
200    0.906
250    0.910

TABLE IV


VII. Concluding remarks 

A new SPS algorithm has been proposed in this paper. Unlike the original version of SPS, it can be used when the regressors contain past values of the system output, which makes it suitable for the identification of ARX systems. The algorithm makes use of instrumental variables (IV). However, the reason for using an IV with SPS is quite different from that of other IV system identification methods. In particular, in this version of SPS the IV does not counteract the presence of correlated noise, as it does in other IV approaches; in fact, the noise terms are assumed in this paper to form an independent sequence. Instead, the IV is introduced to ease the implementation of the method. The IV contains only exogenous variables that are not affected by the system noise, so no sign perturbation of the IV is required when the sign-perturbed functions are constructed. As an alternative, one may consider using the original regressor $\varphi_t$ in place of the IV, which might give better-shaped regions. However, this would require a more cumbersome implementation of the algorithm for the sign perturbation of the regressor, as is done in [2]. An evaluation of the pros and cons of these two approaches will be the subject of future investigations.

Appendix A 

Proof of Theorem 1: Exact Confidence

We begin with a definition and some lemmas.⁵

Definition 1: Let $Z_1, \dots, Z_k$ be a finite collection of random variables and $\succ$ a strict total order. If for all permutations $i_1, \dots, i_k$ of the indices $1, \dots, k$ we have
$$\mathbb{P}\left(Z_{i_k} \succ Z_{i_{k-1}} \succ \cdots \succ Z_{i_1}\right) = \frac{1}{k!},$$
then we call $\{Z_i\}$ uniformly ordered w.r.t. the order $\succ$.

Lemma 1: Let $\alpha, \beta_1, \dots, \beta_k$ be i.i.d. random signs. Then the random variables $\alpha, \alpha\cdot\beta_1, \dots, \alpha\cdot\beta_k$ are i.i.d. random signs.

Lemma 2: Let $X$ and $Y$ be two independent random vectors, taking values in $\mathbb{R}^{k_1}$ and $\mathbb{R}^{k_2}$ respectively. Consider a (measurable) function $g : \mathbb{R}^{k_1} \times \mathbb{R}^{k_2} \to \mathbb{R}$ and a (measurable) set $A \subseteq \mathbb{R}$. If $\mathbb{P}(g(x, Y) \in A) = p$ for all (constant) $x \in \mathbb{R}^{k_1}$, then also $\mathbb{P}(g(X, Y) \in A) = p$.

The following lemma highlights an important property of the $\succ_\pi$ relation introduced in Section III.

Lemma 3: Let $Z_1, \dots, Z_k$ be real-valued, i.i.d. random variables. Then they are uniformly ordered w.r.t. $\succ_\pi$.

Proof of Theorem 1:

By construction, the parameter $\theta^*$ is in the confidence region if $\mathcal{R}(\theta^*) \le m - q$. This means that $\|S_0(\theta^*)\|^2$ takes one of the positions $1, \dots, m-q$ in the ascending order (w.r.t. $\succ_\pi$) of the variables $\{\|S_i(\theta^*)\|^2\}_{i=0}^{m-1}$. We are going to prove that the $\{\|S_i(\theta^*)\|^2\}_{i=0}^{m-1}$ are uniformly ordered. Hence $\|S_0(\theta^*)\|^2$ takes each position in the ordering with probability $1/m$, and its rank is at most $m - q$ with probability $1 - q/m$.

First, we fix a realization of the instrumental variables by conditioning on the $\sigma$-algebra they generate. We apply the following results realization-wise, since the noise and the instrumental variables are independent by assumption.

Note that for $\theta = \theta^*$, all $S_i(\cdot)$ functions have the form
$$S_i(\theta^*) = H_n^{-1/2}\frac{1}{n}\sum_{t=1}^n \alpha_{i,t}\psi_tN_t$$
for all $i \in \{0, \dots, m-1\}$, where $\alpha_{0,t} = 1$, $t \in \{1, \dots, n\}$.

⁵For the proofs of the lemmas, refer to [4].











Therefore, all the $S_i(\cdot)$ functions depend on the perturbed noise sequence $\{\alpha_{i,t}N_t\}$ via the same function for all $i$, which we denote by $S(\alpha_{i,1}N_1, \dots, \alpha_{i,n}N_n) = S_i(\theta^*)$.

Since each $N_t$ is symmetric, $\mathrm{sign}(N_t)$ and $|N_t|$ are independent. For all $i$ and $t$, we introduce $\gamma_{i,t} \triangleq \alpha_{i,t}\,\mathrm{sign}(N_t)$. Since $\{\alpha_{i,t}\}$ are i.i.d. random signs, the $\{\gamma_{i,t}\}$ are also i.i.d. random signs (Lemma 1). Moreover, they are independent of $\{|N_t|\}$.

After fixing a realization of $\{|N_t|\}$, called $\{v_t\}$, we define the real-valued variables $\{Z_i\}$ by
$$Z_i \triangleq \left\|S(\gamma_{i,1}v_1, \dots, \gamma_{i,n}v_n)\right\|^2, \qquad i \in \{0, \dots, m-1\}.$$
Such $\{Z_i\}$ are i.i.d. random variables and, in view of Lemma 3, they are uniformly ordered with respect to $\succ_\pi$.

So far we have proved the theorem assuming that the absolute values of the noise terms are constant: the uniform ordering property was achieved by fixing a realization of $\{|N_t|\}$. However, the probabilities obtained do not depend on the particular realization of $\{|N_t|\}$. Lemma 2 can therefore be applied to remove the conditioning on this realization (in Lemma 2, $X$ plays the role of $\{|N_t|\}$ and $Y$ incorporates the other random variables). This yields the unconditional uniform ordering property of $\{\|S_i(\theta^*)\|^2\}$, from which the theorem follows.
□


Appendix B 

Proof of Theorem 2: Strong Consistency
We will prove that, as $n \to \infty$, for any fixed (constant) $\theta' \neq \theta^*$, $\|S_0(\theta')\|^2 \to (\theta^* - \theta')^\mathrm{T}V^\mathrm{T}H^{-1}V(\theta^* - \theta')$ almost surely. This limit is larger than zero, by the strict positive definiteness of $H$ (A3) and the invertibility of $V$ (A4). For $i \neq 0$, on the other hand, $\|S_i(\theta')\|^2 \to 0$ almost surely. This implies that, as $n$ grows, $\|S_0(\theta')\|^2$ will be ranked as the largest element in the ordering, and $\theta'$ will therefore (almost surely) be excluded from the confidence region as $n \to \infty$. As in the proof of Theorem 1, we derive the results for a fixed realization of the instrumental variables. Since the instrumental variables and the noise $N_t$ are independent, the results obtained hold on the whole probability space (almost surely).

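Using the notation $\tilde\theta = \theta^* - \theta'$, $S_0(\theta')$ can be written as

$$S_0(\theta') = H_n^{-1/2}\frac{1}{n}\sum_{t=1}^n \psi_t\left(Y_t - \varphi_t^\mathrm{T}\theta'\right) = H_n^{-1/2}\frac{1}{n}\sum_{t=1}^n \psi_t\varphi_t^\mathrm{T}\tilde\theta + H_n^{-1/2}\frac{1}{n}\sum_{t=1}^n \psi_tN_t.$$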


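The two terms will be analyzed separately. The convergence of the first term follows directly from A3 and A4, noting that $(\cdot)^{-1/2}$ is a continuous matrix function on positive definite matrices. Thus,

$$H_n^{-1/2}\frac{1}{n}\sum_{t=1}^n \psi_t\varphi_t^\mathrm{T}\tilde\theta = H_n^{-1/2}V_n\tilde\theta \to H^{-1/2}V\tilde\theta$$

almost surely, as $n \to \infty$.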


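The convergence of the second term will now be proved by a component-wise application of Kolmogorov's strong law of large numbers (SLLN) for independent variables [17]. Observe that $\{H_n^{-1/2}\}$ is a convergent sequence, so we only need to prove that the other factor of the product goes to zero (a.s.). Recall that the instrumental variables are fixed. By the Cauchy-Schwarz inequality, A6 and A7, for each component $k$ we have

$$\sum_{t=1}^\infty \frac{\mathbb{E}[\psi_{t,k}^2N_t^2]}{t^2} = \sum_{t=1}^\infty \frac{\psi_{t,k}^2\,\mathbb{E}[N_t^2]}{t^2} \le \sqrt{\sum_{t=1}^\infty \frac{\|\psi_t\|^4}{t^2}}\,\sqrt{\sum_{t=1}^\infty \frac{\left(\mathbb{E}[N_t^2]\right)^2}{t^2}} < \infty.$$

Hence Kolmogorov's condition holds, and by the SLLN

$$H_n^{-1/2}\frac{1}{n}\sum_{t=1}^n \psi_tN_t \to 0 \quad \text{a.s., as } n \to \infty.$$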


Using the two results, we obtain

$$\|S_0(\theta')\|^2 \to \tilde\theta^\mathrm{T}V^\mathrm{T}H^{-1}V\tilde\theta > 0 \quad \text{a.s.},$$

since $V$ has full rank, so that $V^\mathrm{T}H^{-1}V$ is positive definite.
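Now we investigate the asymptotic behaviour of $S_i(\theta')$, $i \neq 0$:

$$S_i(\theta') = H_n^{-1/2}\frac{1}{n}\sum_{t=1}^n \alpha_{i,t}\psi_t\left(Y_t - \varphi_t^\mathrm{T}\theta'\right) = H_n^{-1/2}\frac{1}{n}\sum_{t=1}^n \alpha_{i,t}\psi_t\varphi_t^\mathrm{T}\tilde\theta + H_n^{-1/2}\frac{1}{n}\sum_{t=1}^n \alpha_{i,t}\psi_tN_t.$$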

Again, we inspect the asymptotic behaviour of the two terms separately. The convergence of the second term follows immediately from the previous argument, since the variance of $\alpha_{i,t}\psi_tN_t$ is the same as the variance of $\psi_tN_t$. Thus,

$$H_n^{-1/2}\frac{1}{n}\sum_{t=1}^n \alpha_{i,t}\psi_tN_t \to 0 \quad \text{a.s., as } n \to \infty.$$


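For the first term, since $\{H_n^{-1/2}\}$ is convergent and $\tilde\theta$ is constant, it is enough to show that $\frac{1}{n}\sum_{t=1}^n \alpha_{i,t}\psi_{t,j}\varphi_{t,k}$ converges almost surely to 0 for each $j$ and $k$. To do so, we fix a realization of the noise. Then $\{\alpha_{i,t}\psi_{t,j}\varphi_{t,k}\}$ becomes a sequence of (conditionally) independent random variables with (conditional) zero mean and variances $\psi_{t,j}^2\varphi_{t,k}^2$. From A5, A6 and the Cauchy-Schwarz inequality,

$$\sum_{t=1}^\infty \frac{\psi_{t,j}^2\varphi_{t,k}^2}{t^2} \le \sqrt{\sum_{t=1}^\infty \frac{\|\psi_t\|^4}{t^2}}\,\sqrt{\sum_{t=1}^\infty \frac{\|\varphi_t\|^4}{t^2}} < \infty.$$

Therefore, using the SLLN,

$$\frac{1}{n}\sum_{t=1}^n \alpha_{i,t}\psi_{t,j}\varphi_{t,k} \to 0 \quad \text{a.s., as } n \to \infty$$

holds for (almost) any noise realization, and therefore it holds unconditionally.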

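Now, since $\|S_0(\theta')\|^2 \to \tilde\theta^\mathrm{T}V^\mathrm{T}H^{-1}V\tilde\theta$ and $\|S_i(\theta')\|^2 \to 0$ for $i \neq 0$, we show that the confidence region will eventually (a.s.) be contained in a ball of radius $\varepsilon$ around the true parameter $\theta^*$, for any positive $\varepsilon$.

From the previous results, the event that the functions $\|S_i(\theta')\|^2$ converge for each $i \in \{0, \dots, m-1\}$ has probability 1. Fix an outcome from this event, and define

$$f_n \triangleq \frac{1}{n}\sum_{t=1}^n \psi_tN_t, \qquad f_{i,n} \triangleq \frac{1}{n}\sum_{t=1}^n \alpha_{i,t}\psi_tN_t, \qquad Q_{i,n} \triangleq \frac{1}{n}\sum_{t=1}^n \alpha_{i,t}\psi_t\varphi_t^\mathrm{T}.$$

Given the previous results, for each $\delta > 0$ there must be an $N > 0$ such that, for $n > N$ and for all $i \neq 0$,

$$\|H_n^{-1/2}V_n - H^{-1/2}V\| < \delta, \quad \|H_n^{-1/2}Q_{i,n}\| < \delta, \quad \|H_n^{-1/2}f_n\| < \delta, \quad \|H_n^{-1/2}f_{i,n}\| < \delta.$$

Here $\|\cdot\|$ denotes the spectral norm when its argument is a matrix, i.e., the matrix norm induced by the Euclidean vector norm. Take $n > N$; then

$$\begin{aligned}\|S_0(\theta')\| &= \|H_n^{-1/2}V_n\tilde\theta + H_n^{-1/2}f_n\|\\ &= \|(H_n^{-1/2}V_n - H^{-1/2}V)\tilde\theta + H^{-1/2}V\tilde\theta + H_n^{-1/2}f_n\|\\ &\ge \sigma_{\min}(H^{-1/2}V)\|\tilde\theta\| - \delta\|\tilde\theta\| - \delta,\end{aligned}$$

where $\sigma_{\min}(\cdot)$ denotes the smallest singular value, which is positive because $H^{-1/2}V$ is invertible. On the other hand, we also have

$$\|S_i(\theta')\| \le \|H_n^{-1/2}Q_{i,n}\|\,\|\tilde\theta\| + \|H_n^{-1/2}f_{i,n}\| < \delta\|\tilde\theta\| + \delta.$$

Hence $\|S_i(\theta')\| < \|S_0(\theta')\|$ for all $\theta'$ that satisfy

$$\delta\|\tilde\theta\| + \delta < \sigma_{\min}(H^{-1/2}V)\|\tilde\theta\| - \delta\|\tilde\theta\| - \delta.$$

For $\delta < \sigma_{\min}(H^{-1/2}V)/2$, this condition can be rewritten as

$$\eta_0(\delta) \triangleq \frac{2\delta}{\sigma_{\min}(H^{-1/2}V) - 2\delta} < \|\tilde\theta\|.$$

Therefore, for $n > N$, the $\theta'$ for which $\eta_0(\delta) < \|\theta^* - \theta'\|$ are not included in the confidence region $\Theta_n$. Finally, setting $\delta := \varepsilon\,\sigma_{\min}(H^{-1/2}V)/(2 + 2\varepsilon)$ proves the statement of the theorem for any positive $\varepsilon$. □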
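The last step can be checked directly. Let $s > 0$ denote the constant that multiplies $\|\theta^* - \theta'\|$ in the lower bound on $\|S_0(\theta')\|$. The chosen $\delta$ satisfies $\delta < s/2$ and makes the exclusion threshold equal to $\varepsilon$:

$$\delta = \frac{\varepsilon s}{2 + 2\varepsilon} \;\Longrightarrow\; 2\delta = \frac{\varepsilon s}{1+\varepsilon} < s, \qquad s - 2\delta = \frac{s}{1+\varepsilon}, \qquad \frac{2\delta}{s - 2\delta} = \varepsilon.$$

Hence every $\theta'$ with $\|\theta^* - \theta'\| > \varepsilon$ is excluded for $n > N$, i.e., $\Theta_n \subseteq \{\theta : \|\theta - \theta^*\| \le \varepsilon\}$.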